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A new unbiased Monte Carlo technique called Tensor Network Monte Carlo (TNMC) is introduced based 
on sampling all possible renormalizations (or course-grainings) of tensor networks, in this case matrix-product 
states. Tensor networks are a natural language for expressing a wide range of discrete physical and statistical 
problems, such as classical and quantum systems on a lattice at thermal equilibrium. By simultaneously sam¬ 
pling multiple degrees of freedom associated with each bond of the tensor network (and its renormalized form), 
we can achieve unprecedented low levels of statistical fluctuations which simultaneously parallel the impressive 
accuracy scaling of tensor networks while avoiding completely the variational bias inherent to those techniques, 
even with small bond dimensions. The resulting technique is essentially an aggressive multi-sampling technique 
that can account for the great majority of the partition function in a single sample. The method is quite general 
and can be combined with a variety of tensor renormalization techniques appropriate to different geometries and 
dimensionalities. 


INTRODUCTION 

From the mid-1900’s onwards, we have seen numerical 
methods grow from a useful calculational tool into an en¬ 
tirely new way of performing science, somewhere between 
the traditional theoretical and experimental disciplines. Par¬ 
ticularly within physics, chemistry, mathematics and related 
fields, highly accurate numerical results derived from first 
principles can now often be compared favourably to experi¬ 
ment where (often perturbative) theoretical approaches fail — 
or even used in lieu of experiment to simulate theories under 
difficult to achieve conditions, such as modelling the internals 
of subatomic particles. 

Arguably one of the, if not the , most successful and 
pervasive numerical techniques has been the Monte Carlo 
method m, which uses statistical sampling to approximate 
complicated calculations. In their seminal 1953 paper ED. 
Metropolis et al boldly claim to introduce “a general method... 
of calculating the properties of any substance considered to be 
composed of individual interacting” classical particles. Since 
then, the Metropolis algorithm in particular and Monte Carlo 
techniques in general have been applied with acclaimed suc¬ 
cess to fields as far flung as finance, biology and engineering. 

In the intervening years, many new Monte Carlo algorithms 
have been introduced to extend the method and improve the 
accuracy obtained for a given amount of computational effort. 
Improvements in accuracy can roughly be separated into two 
categories: improving the generation of independent statisti¬ 
cal samples, and increasing the accuracy of a single sample. 
In the former, Markov chain algorithms such as the Metropo¬ 
lis algorithm generate new samples by incremental changes 
to the earlier samples, introducing a degree of autocorrelation 
between the samples. This is detrimental because the statisti¬ 
cal error A E of a measured quantity E generically decreases 
with the number of completely independent samples N as 

lvai\E] 

(1) 

Autocorrelation decreases the number of effectively indepen¬ 


dent samples, reducing the denominator. Important tech¬ 
niques to reduce autocorrelation include cluster updates to 
avoid the well-known slowing down in critical problems |3] 
|4), loop updates for quantum or topological problems J5] El, 
and worm algorithms to sample across off-diagonal quantum 
observables 0. 

The second class of improvements act to decrease the nu¬ 
merator in Eq. 0 by decreasing the sample-to-sample vari¬ 
ance of the observable E. Monte Carlo methods work to ap¬ 
proximate a very large sum over many variables according to 
their probabilities. Importance sampling, which modifies the 
probability distribution, is an effective way to “flatten” the dis¬ 
tribution of the summands so that each contribution has sim¬ 
ilar values. In practice, importance sampling can only reduce 
Var[2£] only up until a point, for instance to the variance ob¬ 
served in the ensemble of physical realizations of the system. 

A second path to reducing the variance is by, within each 
sample, explicitly summing a tractable fraction of the large 
sum that Monte Carlo is attempting to estimate. Sometimes 
this is achieved relatively trivially, by exploiting symmetries 
such as translational invariance, or it may involve moderately 
challenging calculations at each step to perform the partial 
summation. In any case, a single sample may be said to 
be effectively performing multiple samples of the sum — a 
process (when implemented explicitly) referred to as multi¬ 
sampling iU. In this work, a new, generic multi-sampling 
technique is introduced that can lead to massive reductions in 
Var[2?] for a wide range of classical and quantum statistical 
problems on a lattice. In the best-case scenario, the error may 
decrease exponentially with increasing computational effort, 
as opposed to the slower TV -1 / 2 scaling of traditional Monte 
Carlo. 

This new method is based on modern techniques for renor¬ 
malizing (by which I mean ‘approximately summing’) tensor 
networks. Tensor networks are an extremely powerful lan¬ 
guage for expressing problems (such as the partition function 
of classical or quantum systems) and methods for their con¬ 
traction (which is, for all intents and purposes, a sum over 
many terms). Tensor network calculations came to promi- 
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nence with the advent of White’s Density Matrix Renormal¬ 
ization Group (DMRG) in 1992 0, famous for its extraordi¬ 
nary accuracy in solving one-dimensional quantum systems, 
and which is intimately connected with a tensor decomposi¬ 
tion known as matrix product states (MPS) m or sometimes 
tensor trains. MPS and DMRG techniques can similarly be 
applied to the contraction of 2D lattices of tensors, by uti¬ 
lizing an MPS to represent a boundary state (ED or modify¬ 
ing the original DMRG algorithm to simultaneously approxi¬ 
mate the four quadrants of the lattice «m In all these cases, 
the techniques brought an unprecedented level of accuracy for 
simulating strongly-correlated systems. 

From here, tensor network research has been focused on 
two complimentary efforts. The first, which I call the ‘vari¬ 
ational’ approach, attempts to capture a representation of a 
(typically, quantum) state using a tensor network decompo¬ 
sition that is efficient to store and manipulate on a com¬ 
puter and which can be optimized variationally to obtain 
the state with lowest energy — the ground state. Examples 
of such states include MPS, tree tensor network states M 
HU, projected entangled-pair states (PEPS, for 2D quantum 
systems) mm, the multi-scale entanglement renormal¬ 
ization ansatz (MERA, which can represent scale-invariant 
states) fl8l 1T9) and branching unitary structures for metallic 
phases [2j); :2Tj]- In each of these, the ‘bond-dimension’ or di¬ 
mensionality of the indices in the tensor network state D is 
free to be determined by the programmer, resulting in a con¬ 
trol which can increase accuracy at the expense of increased 
computational cost. At the low-end of D = 1 we recover 
the set of product states, and thus mean-field theory, but as D 
increases the accuracy may increase rapidly, even exponen¬ 
tially. As a result, these variational approaches have proven 
very competitive, even for the study of challenging strongly- 
correlated (including fermionic) 2D quantum systems (e.g. 

S22H21). 

The second approach, which I call the ‘renormalization’ 
approach and was mentioned above, seeks not to approxi¬ 
mate the state or wavefunction of the system but rather di¬ 
rectly approximates the calculation in question, for example, 
the calculation of a partition function (and thus free energy) 
or another observable of a statistical problem which is natu¬ 
rally expressed as the contraction of a tensor network. Exam¬ 
ples include boundary-MPS, corner transfer matrix, the tensor 
renormalization group (TRG) of Levin & Nave lf25l and later 
‘higher-order’ tensor renormalization group (HOTRG) (26), 
as well as the recent (and broadly named) tensor network 
renormalization (TNR) of Evenbly & Vidal (27). In each 
method, one attempts to solve the system by successively ap¬ 
proximating larger and larger blocks. In each method, one 
has control of the bond-dimension D which represents num¬ 
ber of degrees of freedom retained in each course-grained (or 
‘renormalized’) block, while the remaining degrees of free¬ 
dom are discarded, introducing a degree of error. And like the 
variational approaches, in each method, the accuracy has been 
observed to improve extremely rapidly with D, by being able 
to account for the vast majority of the partition function. 


The marriage of tensor networks with Monte Carlo is an 
idea that has been attempted before. The appeal is obvious 
— the combination of the rapidly-improving accuracy of ten¬ 
sor networks with the speed, parallelism and unbiased nature 
of Monte Carlo would breed a best-of-both-worlds calcula¬ 
tion technique. Apart from Il28l (which introduces a varia¬ 
tional fixed-node or constrained-path approximation), the at¬ 
tempts to date have focussed on variational Monte Carlo, 
where Monte Carlo is used to sample and update variational 
tensor network states (2911341 . Unfortunately, optimizing a 
tensor network state with Monte Carlo estimates has proven 
challenging due to the combination of statistical errors with 
extremely stiff optimization problems. The resulting algo¬ 
rithm rather combined the worst aspects of variational states 
and large statistical errors! 

The silver-lining of these experiments has been the abil¬ 
ity to extract observables from tensor network states which 
would otherwise be too computationally demanding to calcu¬ 
late — for example the energy of PEPS with large bond di¬ 
mension ED- In fact, it partly is this limited success which 
has motivated this work. While optimization problems involv¬ 
ing many variables (thousands to millions) proved unstable to 
statistical errors, the calculation of a single number such as 
the norm of a wavefunction is a well-controlled problem. 

Thus we now turn to applying Monte Carlo to tensor renor¬ 
malization techniques where the goal is the extraction of a 
single (or just a few) numbers from the contraction of a large 
tensor network. In fact, standard Monte Carlo techniques are 
applied exactly in this way — by choosing a single configura¬ 
tion for each bond of the tensor network representation of the 
partition function, the result decomposes into a simple prod¬ 
uct of numbers. Further, we will see standard Monte Carlo is 
equivalent to the D = 1 version of the technique introduced 
here! 

Where I extend this technique is by multi-sampling: keep¬ 
ing D > 1 of the degrees of freedom associated with each 
bond. Normally, this would result in an intractable tensor net¬ 
work contraction, but the ideas of tensor renormalization are 
used to calculate each sample efficiently. At each iteration, 
course graining is performed, and a subspace of dimension D 
of the bonds associated with the enlarged block are selected, 
before repeating. Importance sampling is performed in an 
optimized basis, and degrees of freedom are selected accord¬ 
ing to their individual weights which are correlated with the 
magnitude of their contribution to the total partition function. 
What is observed is that each sample can account for the great 
majority of the partition function, and the sample-to-sample 
variance decreased extremely rapidly with increased D, re¬ 
sulting in errors bars of extremely small magnitude which are 
rare even in Monte Carlo studies of very simple systems. In 
the systems studied here, it was even possible to forgo Markov 
chain sampling and generate the samples directly (via ‘per¬ 
fect’ sampling), eliminating all autocorrelation effects. The 
Monte Carlo averages are unbiased and do not display the 
variational error typical to tensor network techniques. The 
method is seen to combine the best aspects of Monte Carlo and 
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tensor network calculations, and the improvement over stan¬ 
dard Monte Carlo has a certain analogy with the improvement 
of variational tensor network states over mean-field theory. 

The purpose of this paper is to outline this new Monte Carlo 
algorithm based on sampling over the space of tensor renor¬ 
malizations (or truncations, or projections) called Tensor Net¬ 
work Monte Carlo (TNMC), and to demonstrate numerically 
that (a) the TNMC algorithm inherits extremely small sample - 
to-sample variance from the legendary precision of tensor net¬ 
work methods and (b) the variational bias of the tensor net¬ 
work technique is removed by the unbiased sampling. The 
algorithms and example systems are chosen to be among the 
simplest possible for clarity: matrix-product states applied to 
integrable 2D classical systems at thermal equilibrium. How¬ 
ever, it should be kept in mind that the core of the Monte Carlo 
sampling algorithm can easily be applied to more advanced 
tensor network renormalization schemes, and that the tech¬ 
niques can be applied to both out-of-equilibrium and/or quan¬ 
tum problems in ^/-dimensions by contraction of a (d + 1)- 
dimensional tensor network, both of which are examined in 
the Discussion. 


MONTE CARLO ALGORITHM 
Boundary MPS calculations 

I begin by explaining a simple tensor network algorithm 
for contracting (approximately) a two-dimensional tensor net¬ 
work on a square lattice. This type of contraction problem 
occurs in many scenarios, including solving 2D classical sys¬ 
tems at thermal equilibrium and ID quantum lattice systems 
via the path-integral formulation, as well as certain statisti- 
cal/combinatorical problems. The core ideas can be applied to 
different lattice geometries and higher-dimensional problems. 

The tensor network to contract is shown in Fig. [T] a closed 
tensor network diagram (with no open indices) representing a 

single number which in all cases I call the partition function 
Z. I denote the dimension of the bonds as d and thus the rank- 

4 tensors in the bulk are d x d x d x d tensors. Contracting 
the tensor network and obtaining Z exactly has a computa¬ 
tional cost that scales exponentially with the lattice length L , 
as 0(d L ). To achieve this, the tensors in the first row could 
be multiplied with those in the second row, and then the third 
row, etc. At the ith step, the horizontal bond dimension has 
grown to d l . 

A more efficient, though approximate, way of contracting 
the tensor network is by truncating the horizontal bonds to 
some maximal dimension D. Specifically, in the renormal¬ 
ization schemes based on tensor networks, truncating a bond 
from dimension R to D is achieved by applying a pair of lin¬ 
ear maps Wl an d Wr from the R dimensional space to a I? 
dimensional subspace — these linear maps are simply Rx D 
matrices and, typically, the pair form a projector P = WlWJ i 
where P 2 * 4 = P and Wl is the quasi-inverse of Wj t (the pro¬ 


jector is said to be orthogonal when Wl = Wj j are isometric 
matrices). The scheme is depicted in Fig. [I] where Wl and 
Wr are drawn as pairs of triangular tensors. 

What remains is the determination of each of the Wl and 
Wr. The boundary-MPS algorithm relies on minimizing the 
error of the boundary state made up of the all the outgo¬ 
ing vertical bonds from the upper region, outlined in blue in 
Fig.[T](b). If the error is taken with respect to the 2-norm, then 
the optimal values of Wl and Wr can be determined directly. 
Take the state before truncation as |MPS) and after |MPS / ). 
We wish to solve the following minimization problem. 


min 

w L ,w R 


IMPS) - IMPS') 


2 


2 


( 2 ) 


This can be solved most readily by using the Schmidt de¬ 
composition of the state, 

R 

\MPS) =Y,S l \L i )®\R l ) (3) 

i=t 


where the Schmidt vectors are orthonormal ((Li\Lj) = Sij 
and (Ri\Rj) = 5^) and the Schmidt coefficients can be cho¬ 
sen real and non-negative (S t > 0). Any matrix-product state 
is readily put into this form, though to do so has a numeri¬ 
cal cost scaling as the cube of the bond-dimension and, this 
procedure contributes the leading-order cost of the entire cal¬ 
culation. Details of how to perform this transformation can be 
found in the Appendix. 

Once in this form, the optimal truncation is straightfor¬ 
ward: we simply select the D largest Schmidt coefficients 
Si,, Sr and discard the remainder. This is achieved by a 
projector which is diagonal in the Schmidt basis. In the com¬ 
mon case that the magnitude of the Schmidt coefficients de¬ 
creases rapidly, the approximation can be extremely accurate 
for moderate values of D. In fact, for wide classes of physi¬ 
cal systems it has been commonly observed that the error de¬ 
cays exponentially (or superpolynomially) with D, however 
this not true for critical systems, glassy systems, or systems of 
higher effective dimensionality. 


Tensor Network Monte Carlo 

The boundary-MPS version of Tensor Network Monte 
Carlo proceeds exactly as above, except uses a non- 
deterministic algorithm to obtain Wl and Wr at each step. 
Although the subspace chosen by the projector will not al¬ 
ways be “optimal” as above, this will allow us to, on average, 
take unbiased samples. 

They key to unbiased sampling to is to sum up all contri¬ 
butions to Z, big and small, in the limit of infinite samples. 
Given a set of possible samples S, sample s € S defines two 

(s) 

Rx D matrices W y L ’ and W R ' such that on average, 

±-ST W L )W R n = L 

1 1 se S 


( 4 ) 
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FIG. 1: (a) A two-dimensional partition function expressed as a tensor network. Boundary-MPS contraction proceeds by combining the top 
row of tensors with the next (red circles), and then (b) projecting the combined horizontal bonds to a subspace of maximal dimension D 
(triangular tensors). Typically, the projectors are chosen to maximize the fidelity of resulting effective state of the upper-half of the system, 
outlined in blue, (c) After repetition, the end result is a tensor network that is contractible with cost linear in system size. 



Graphically, this relation is: 

After sufficient sampling, the effect of the projectors is the 
identity operations — in other words, there is no effect on the 
calculated partition function. On average , Fig. |T| (a), (b) and 
(c) will give the same result! 

The choice of operators and ' that fulfils Eq. ([dji 
is not unique. The goal now is to take a selection which mini¬ 
mizes the expectation value of the error, so that the sample-to- 
sample variations are small. For instance, taking random pro¬ 
jectors (with appropriate scaling) may be a solution of Eq. Q, 
but it would be an extremely poor sampling scheme with un- 
workably large statistical variance, requiring a huge number 
of samples to achieve a reasonable accuracy. 

To improve the statistical variance, importance sampling is 
used. In analogy to the choice of optimal projectors above, I 
use the Schmidt decomposition to fix the sampling basis and 
use the Schmidt coefficients Ci to define the weights related 
to choosing D of the R possible basis vectors. Motivated 
by minimizing the expected 2-norm error [Eq. Q], I assign 
weight c 1 to the the probability of selecting the zth vector. 
The weight of simultaneously choosing vectors ii, i%,..., ijj 
is given by their product, 

D 

w(ii,i 2 ,...,iD) = (6 ) 

j =i 

The probability of a certain selection i = [i\,... ,in] is the 
normalized form of these weights, 

p(i) = iu(i)/^u>(i). (7) 

i 

This probability distribution is known in the literature as “Fis¬ 
cher’s multivariate, non-central hypergeometric distribution” 
where each element can be selected at most once. It is also 
related to the canonical Fermi-Dirac distribution (i.e. with 


fixed particle number, D). How to sample efficiently and di¬ 
rectly from this distribution is not immediately obvious, but 
it is possible to write the probability distribution itself as a 
simple matrix-product state from which perfect sampling can 
be performed (32) [35]. See the Appendix for details and the 
freely available sampling code. 

The advantage of this distribution is that the Schmidt coef¬ 
ficients with largest weights are almost-always selected, while 
the tails of the distribution are only selected occasionally. This 
gives us most of the accuracy of the simple-truncation ap¬ 
proach, while sampling removes the bias inherited from deter¬ 
ministically throwing away part of the boundary state. How¬ 
ever, to ensure the requirements of Eq. 0 are met, we must 
scale each component k by the inverse of its appearance rate 
l/r(fc), where 


r(k) = Xow w 


The values of r(k) are readily extracted from the matrix- 
product form of Fischer’s hypergeometric distribution. The 
resulting Wl and Wr do not form a projector; nonetheless, 
they are expected to lead to single-sample accuracies that are 
comparable to the standard truncation approach. The entire 
sampling cost scales linearly in the product DR and is not a 
leading-order cost of the simulation. 

It is essential that the projectors at different points in space 
are chosen independently. Because of this, TNMC is limited 
to simulations of finite systems, which is one drawback com¬ 
pared to variational tensor network approaches which may di¬ 
rectly address the thermodynamic limit. 

At the end of the calculation, an estimate of the partition 
function Z s is produced. To get an estimate of Z and of its 
statistical uncertainty, multiple samples should be taken to de¬ 
termine the mean and standard error. 
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FIG. 2: One of the approximately 7.05 x 10 56 loop configurations 
on a 16 x 16 lattice with open boundary conditions. 


NUMERICAL RESULTS 
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FIG. 3: Calculations of the partition function of the six-vertex model 
on a 16x16 lattice (where Z ex act ~ 7.05 x 10 56 ), using standard 
MPS truncation and 10 4 tensor network Monte Carlo samples with 
the given bond dimension D. The TNMC results are within one or 
two standard deviations to the exact results, independent of bond di¬ 
mension (the 0 = 1 result suffers from broad tails on the distribu¬ 
tion, due to our perfect sampling approach). In both cases, the errors 
are reduced with increasing bond dimension. 


Statistical error versus bond dimension 


The model: We first study the partition function of a sim¬ 
ple fully-packed loop model on a 2D lattice, which represents 
the counting of all possible closed-loop configurations where 
each vertex must have a single line entering and exiting, rep¬ 
resented locally by these six configurations: 



An example configuration on an open 16 x 16 lattice is de¬ 
picted in Fig. [2] The model has one one-to-one correspon¬ 
dence to the six-vertex model as well as the counting of al¬ 
ternating sign matrices, and is integrable (solvable) via Bethe 
ansatz techniques. The number of permissible loop configu¬ 
rations can be deduced by creating a “partition function”, to 
which the value 1 is added for every allowable configuration. 
The partition function can be expressed as a two-dimensional 
network of connected tensors. At each vertex, a 2x2x2x2 ten¬ 
sor T is placed and connected to it’s nearest neighbours. The 
value Tijki = 0 unless it corresponds to one of the six con¬ 
figurations depicted above, that is, Tnoo = Tiooi = 2ono = 
Toon = Tioio = Tcnoi = 1. The number of allowable con¬ 
figurations grows very fast in the lattice size. This calculation 
is selected because of its simplicity and accessibility to a wide 
audience, and because its numerical evaluation is moderately 
challenging. 

Results: In Fig. [3] I plot the results for the calculation 
of the partition function (or number of loop configurations), 
compared to the quasi-exact result extracted from a boundary - 
MPS with large bond dimension. Here I have studied a modest 


16 x 16 lattice with open boundary conditions, which allows 
for appoximately 7.05 x 10 56 configurations. While boundary - 
MPS calculations using the standard truncation scheme con¬ 
tinually underestimates the partition function, the Tensor Net¬ 
work Monte Carlo estimates fluctuate about the exact result 
(in this case, determined quasi-exactly by an MPS calculation 
with large bond dimension) independently of bond dimension. 

What does change with bond-dimension is the accuracy 
or precision of both simulations, illustrated in Fig. [4] The 
standard tensor network technique converges to the exact re¬ 
sult extremely rapidly as the bond dimension is increased; in 
fact, for D > 40 we have already saturated numerical pre¬ 
cision (larger D may be required for bigger systems or peri¬ 
odic boundary conditions). What we see clearly from Fig. [4] 
is that the TNMC method inherits similar behaviour — as 
the bond dimension increases, the sample-to-sample varia¬ 
tions decrease just as rapidly as the standard tensor network 
approach (although the standard deviation may be a small 
constant larger in magnitude than the variational error, which 
is the cost paid to obtain an unbiased simulation). This is 
an enormous improvement over standard, configuration-based 
Monte Carlo, where the sample-to-sample variance is fixed by 
the problem or physical system you are trying to solve. 

The TNMC method now gives the user two avenues to de¬ 
crease simulation error — increasing either bond dimension, 
or the number of samples. While performing N samples de¬ 
creases the overall error by a factor of 1/y/N, increasing D 
may result in a much more significant improvement (often 
super-polynomial). Nonetheless, there are many situations 
where there are insufficient computational resources to utilize 
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FIG. 4: Statistical and truncation error for the partition function of 
the six-vertex model, as in Fig. [3] Unlike standard Monte Carlo, a 
single tensor network Monte Carlo sample may account for the vast 
majority of the partition function, therefore reducing the statistical 
fluctuations. The standard deviation between TNMC samples paral¬ 
lels the behaviour of the traditional tensor network contraction, the 
error decaying extremely rapidly with respect to bond dimension. 


This well-known model exhibits a phase transition from a 
disordered phase (randomly aligned spins) to ferromagnetic 
phase (all spins aligned in the same direction, on average) at 
a critical temperature T c ss 2.278. At this point Z becomes 
non-analytic in the thermodynamic limit, and the symmetry 
is broken for T < T c (the system may either be majority 
spin-up or majority spin-down). Here, I will study finite-sized 
L x L systems (L odd) with closed boundary conditions (the 
boundary consisting of all up spins), resulting in finite-size 
effects that smooth out the phase transition. The spontaneous 
magnetization M, as measured by the average magnetization 
M = sjj of the central spin c, is a measure of the order pa¬ 
rameter. I have chosen this configuration because of its con¬ 
venience with respect to the TNMC using boundary MPS, 
although periodic boundaries would result in smaller finite- 
size effects. The central spin is measured by contracting two 
boundary-MPSs, one from the top boundary and another in¬ 
dependently from the bottom boundary. The final contraction 
involves the central row surrounded by the two boundary MPS 
and does not involve renormalization of the spin being mea¬ 
sured , allowing us to easily measure both Z and the function 

Zm 


a large enough bond-dimension for accurate, trustworthy and 
unbiased variational results. In this case, the TNMC may be 
useful to remove the bias of standard tensor network simula¬ 
tions, which I investigated next. 

Removal of variational bias 

The model: Next we study the classical Ising model on 
a square lattice at zero magnetic field. At each vertex i of the 
lattice, a spin ,s, ; can take values ±1. The Hamiltonian is a 
sum of two-body contributions, where nearest neighbours on 
the lattice contribute energy — 1 if they align and +1 if they 
anti-align. Formally, the Hamiltonian is 

H=-jJ2 s > s T ( 9 ) 

<i,j> 

where <i,j> denote nearest neighbour pairs, i and j. With¬ 
out loss of generality, I take the coupling constant J = 1 
and Boltzmann’s constant kg = 1. At temperature T, the 
probability of a configuration s = [si, S 2 ,... ] is p( s) = 
exp(— H(s)/T)/Z, where Z is the partition function is the 
normalization constant of all the individual weights: 

Z = J2eM-H/T) = J2 II ex P(~ s i s j/T)- ( 10 ) 

s s <i t j> 

This result can be expressed as a two-dimensional tensor net¬ 
work in the form studied above, with rank-four tensors on the 
vertices of a square lattice (see the Appendix for further de¬ 
tails). 


Zm = ^ s c exp (-H/T) (11) 


which gives the spontaneous magnetization M = Zm IZ. In 
the Monte Carlo simulation, the numerator and denominator 
must be averaged before division, and because of this the jack¬ 
knife technique is used here to estimate the statistical error. It 
should be noted that wild fluctuations were observed when 
attempting to renormalize the measured spin, by including it 
in rows which are truncated in a simple top-to-bottom con¬ 
traction, as the strong correlation between Zm and Z is lost 
(resulting in single-sample estimates of M that may be well 
outside the “physical” region 0 < M < 1, even though aver¬ 
ages after many samples may be correct). 

Results: In Fig. [5j I plot the spontaneous magnetization 

of a 63 x 63, comparing boundary-MPS results with stan¬ 
dard truncation and TNMC using bond dimension D = 2 
to quasi-exact results (obtained with large bond dimension). 
The result is clear — the variational MPS technique overes¬ 
timates both the spontaneous magnetization and the tempera¬ 
ture of the phase transition, however the TNMC results fluc¬ 
tuate about the exact curve as expected. The variational result 
prefers the “simpler” ferromagnetic phase since it is some¬ 
how “easier” to capture with a small bond-dimension (here, 
the symmetry is explicitly broken by the boundary condition). 
This type of bias is a well-know problem with all variational 
techniques. On the other hand TNMC result allows for spatial 
fluctuations much like standard Monte Carlo, and has no pref¬ 
erence for one phase over the other. The tensor network bias 
has been completely removed. 
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Temperature T 

FIG. 5: The central magnetization of the classical 2D Ising model 
on a 63 x 63 lattice with closed boundary conditions as a function of 
temperature. The standard MPS method with bond dimension D = 2 
overestimates the magnetization and thus transition temperature due 
to the bias introduced by truncation. The TNMC with same bond 
dimension and 10 3 samples gives an unbiased result, however the 
statistical fluctuations are greatest in the same region where the bi¬ 
ased technique struggles. 

DISCUSSION 
Markov chain sampling 

The method above has relied on so-called “perfect sam¬ 
pling”, where each sample is drawn independently from the 
last. This has many benefits, in that it is simple to parallelize 
and the independence of samples makes statistical analysis 
straightforward. 

However, perfect sampling may also have some impor¬ 
tant limitations. The distribution of the estimates Z s is well- 
controlled for sufficiently large bond-dimension, but for small 
bond-dimensions and very large systems the statistics may be 
very broad and suffer from long tails. In this case, very un¬ 
likely samples may make very significant contributions to the 
partition function, and worse, the measured variance may not 
correspond to the true error. The observation made during the 
simulations presented here is that perfect sampling is effec¬ 
tive whenever the bond-dimension is large enough to capture 
a large fraction of the partition function (even for the vari¬ 
ational method). This becomes more problematic for larger 
systems because Z increases exponentially, not linearly, with 
system size, and a constant fraction of Z is necessarily lost 
at each truncation step. TNMC techniques at least have the 
possibility to overcome this limitation because their accuracy 
may also improve exponentially in the bond-dimension. On 
the other hand, standard Monte Carlo techniques almost never 
utilize perfect sampling because of the inherent difficulty in 
choosing relevant samples according to the partition function. 

The alternative to perfect sampling is Markov chain sam¬ 


pling, where the configuration is updated iteratively, and is 
preferred because it rapidly stabilizes to “likely” configura¬ 
tions according to the partition function. The Metropolis al¬ 
gorithm is perhaps the most well-known Markov chain sam¬ 
pling technique, where a new configuration s' is accepted with 
a probability that depends on the ratio of Z s i/Z s . 

A similar technique could be applied to TNMC, but must 
first be adapted to the multi-sampling that takes place at each 
projection. The tensor-network algorithm would work in 
a similar fashion to the so-called “second-renormalization”’ 
group (SRG) where the projectors are optimized iteratively. 
However, now the selected subspace would be chosen ac¬ 
cording to the product of their individual weights, similar to 
Eqns. and then individually rescaled according to their 
rates [Eq. ([8])]. One expects that this would improve both the 
quality of each sample (i.e. how much of the partition func¬ 
tion is accounted for in a single sample, similar to how the 
SRG improves over TRG) and the variance of observables, at 
the expense of some autocorrelation between samples. 


Other tensor network renormalization schemes 

The TNMC can be applied in a straightforward way to a 
variety of geometries using existing renormalization schemes. 
In the corner transfer matrix (CTM) and the various tensor 
renormalization group (TRG) techniques, the renormalization 
scheme proceeds by iteratively enlarging the course-grained 
blocks and truncating the bonds of the resulting tensor net¬ 
work. Instead of choosing the subspace that maximizes the 
fidelity of the representation of the blocks, one could choose 
them randomly according to their individual importance using 
the core of the TNMC algorithm presented here. 

One particularly exciting prospect would be the applica¬ 
tion to 3D partition functions using 3D TRG algorithms. The 
“higher-order” TRG (HOTRG) method in particular has been 
used to obtain very accurate results. Despite this, HOTRG 
is known to not account for all short-range correlations ac¬ 
cording to the ‘area law’ in three or higher dimensions (36), 
in analogy to the way that DMRG and tree tensor networks 
are unable to account for all the correlations of very large 2D 
quantum systems. 

One way of avoiding any doubts surrounding 3D HOTRG 
raised by this problem is by using TNMC to remove any po¬ 
tential bias, while the renormalization scheme itself is used 
to average a significant part of the partition function — even 
if it is not able to be account for an absolute majority of 
Z with a single sample, statistical variations of observables 
will be much smaller than in standard, configuration-based 
Monte Carlo, especially those observables measured in the 
well-renormalized regions of space. This development would 
be exciting because it would open the possibility of precise 
and unbiased calculations of challenging 3D classical, and 2D 
quantum, systems. 
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Quantum systems and real-time evolution 

Any (/-dimensional quantum statistical problem can be ex¬ 
pressed, at least approximately, as a (d + 1)-dimensional ten¬ 
sor network called the ‘path integral’ via a decomposition 
of the imaginary time evolution, exp (-H/T). Performing 
(quantum) Monte Carlo by sampling from the resulting ten¬ 
sor network is known as Path Integral Monte Carlo (PIMC). 
PIMC has been wildly successful in many cases, but suf¬ 
fers from the sign problem in higher-dimensional frustrated 
or fermionic systems. 

TNMC can improve upon PIMC in two ways. Firstly, even 
in the absence of a sign problem, TNMC will display much 
smaller statistical variations than traditional PIMC, and may 
be a much more economical method of generating results of 
a certain precision. Secondly, the HOTRG has been shown 
to mitigate the sign problem given a sufficiently large bond- 
dimension 03. allowing the summation of both positive and 
negative contributions to the partition function to obtain the 
correct result. Similarly, each TNMC sample generated via 
3D HOTRG would also sum over many of the positive and 
negative contributions, potentially eliminating the sign prob¬ 
lem entirely if the bond-dimension is sufficient. The Monte 
Carlo aspect allows us to parallelize work over many comput¬ 
ers and, by averaging many samples, obtain a higher precision 
than is possible with the variational HOTRG method. The 
possibility of creating a sign-problem resistant Monte Carlo 
to attack challenging 2D quantum problems is certainly entic¬ 
ing, and is an area of intense interest for the author. 

The extra ‘imaginary-time’ dimension could be replaced 
with real-time evolution for simulating unitary quantum dy¬ 
namics, the stochastic dynamics of classical systems, or some 
combination of both. It is expected that our ability to perform 
unitary quantum dynamics will be limited — ultimately by 
the sign problem and because correlations in time do not de¬ 
cay under unitary dynamics. The dynamics of open systems 
is interesting both theoretically and because of the relevance 
to modern physics experiments, and given that correlations do 
decay in time as the system relaxes to the steady state, the 
TNMC should work well. 

One could speculate further, and foresee continuous-time 
Tensor Network Monte Carlo techniques such as projector 
Monte Carlo, stochastic series expansion or diagrammatic 
Monte Carlo, or anticipate a connection between real-time 
evolution of MPS and the Monte Carlo trajectories of simpler 
variational wavefunctions, for instance, by the Positive-P sim¬ 
ulation method that uses the bosonic coherent state basis lf38l . 


CONCLUSION 

I have introduced an entire new class of calculation under 
the umbrella of Tensor Network Monte Carlo. TNMC is ca¬ 
pable of producing significantly less sample-to-sample vari¬ 
ance than standard Monte Carlo through what is essentially 
an aggressive multi-sampling technique that can account for 


the majority of the partition function in a single sample. Al¬ 
though this approach is based on methods for renormalizing 
tensor networks, TNMC exhibits none of the variational bias 
of those techniques. 

The core of the algorithm is modular by nature and can 
be readily retrofitted to other tensor network renormalization 
schemes beyond the boundary-MPS calculations presented 
here. The most obvious application would be to Tensor Renor¬ 
malization Group calculations for 2D and 3D partition func¬ 
tions. The method naturally applies to quantum systems via 
the path integral formulation, and can potentially by used to 
approach a huge array of problems in physics and beyond. It is 
natural to attempt the simulation of strongly-correlated, frus¬ 
trated or fermionic quantum problems in 2D — and ask if we 
can achieve the ‘holy grail’ of quantum Monte Carlo by per¬ 
forming calculations which are normally plagued by the sign 
problem. 
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APPENDIX 


Sampling basis 


The basis selected to perform sampling is selected in the 
same fashion as the standard truncation approach, via putting 
the boundary state in Schmidt form, which we rewrite. 

R 

IMPS) =^ Ci |Li)® |i?,) (12) 

2—1 

The left Schmidt vectors | Li) and also the right Schmidt vec¬ 
tors form orthonormal bases, (L,;|L ;j ) = 8 i3 = (R, \Rj). 

Getting the MPS into this form is relatively straightforward. 
Beginning on the left half of the chain, one uses the QR- 
decomposition on the first tensor, A\, such that A\ = Q\R\. 



where R is an upper-triangular (possibly rectangular) matrix 
while Q i is an isometric operator QiQ\ = I : 



(14) 


For the same purpose one could utilize the singular-value 
decomposition (SVD) but the Q/(-decomposition is numeri- 
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cally less demanding. At the next step we perform the QR- 
decomposition on R 1 A 2 '. 


where: 



(15) 


(16) 


and so-on, until we reach the desired bond i with generated 
R,. We repeat the procedure from the right, using the re¬ 
lated ‘/.(^-decomposition' where L is lower-triangular and Q 
is again isometric (the LQ-decomposition of M is easily re¬ 
lated to the Q//-decomposition of il/ T via the transpose op¬ 
eration). The process is repeated until we find Li+ 1 . 

We are left with an MPS in a form with all the MPS tensors 
in either left- or right-isometric form, depending on if they 
are to the left or right of the bond we wish to truncate. On this 
bond, we have an additional matrix RiLi + 1 , which we can 
decompose according the the SVD: 


where the S t are in descending order. Typically, we observe 
the distribution of singular values decays quickly — often ex¬ 
ponentially — meaning the approximation error is small. In 
the TNMC, we use this basis since it results in a small expec¬ 
tation value of the 2-norm error (i.e. the error averaged over 
many samples). It should be noted that using this basis is not 
a necessary requirement of TNMC — strictly speaking, any 
basis, even overcomplete bases, can be used along with any 
sampling probabilities so long as Eq. 0 is reproduced. 

After the sample is chosen, we move onto another bond 
and repeat. In practice, all bonds can be truncated with a sin¬ 
gle round of Q//-decompositions to the right followed by the 
SVD sweeping to the left, truncating with independent Monte 
Carlo samples at each stage. 


Sampling algorithm 

In a single truncation step of TNMC, a sample refers to 
the simultaneous selection of some subset of size D drawn 
from R possibilities. Let us refer to that subset as the vector 
i = [Zi, * 2 ,..., Id\ where each element labels a selected basis 
vector. The unnormalized weight of the sample is 


R,L l+1 = USV ., 


(17) 


where U and V are unitary and S is diagonal and non¬ 
negative. The matrix U can be absorbed by the tensor to the 
left, without changing its isometric nature, and similarly V 
can be absorbed to the right. We are then left with the follow¬ 
ing form of the MPS, written for 6 sites. 



(18) 

where the left half forms an orthonormal basis due to the iso¬ 
metric nature of the tensors 



Sij 


= S. 


*7 


(19) 


( 20 ) 


D 

w(h,i 2 ,--.,iD) = ( 22 ) 

3=1 

while the normalized probability is 

p(i) = w{i)/^2w(i). (23) 

i 

This selection can be seen as a kind of multisampling , 
where we imagine independently choosing D samples accord¬ 
ing to their relative weights, Sf. However, we must addi¬ 
tionally apply the constraint that no index is selected more 
than once. Such a constraint can easily be implemented as a 
matrix-product operator (MPO). 

We begin with a product distribution, where each index i 
contributes weight Sf to the product if selected, and 1 other¬ 
wise. This is simply represented by the vector = [1, Sf], 
where we start indexing at zero (v/^ = 1, wf'~ > = Sf). To 
this we then apply the constraint MPO, pictured as 


(24) 



Finally, we truncate the bond by choosing to keep only a 
subset of the singular values (Schmidt vectors). In standard 
MPS truncation, we choose the largest D of the // singular 
values, resulting in an error according to the 2-norm as 


The tensors in the constraint MPO are simply given by 

A s , s ',i,j = S s ,s'<i>i+s,j , where A is labelled: 

s 


D 


IMPS) - |MPS) “ = S i 


( 21 ) 


i=D' +1 




(25) 
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At the ends, we cap the MPO with vectors 1 = [1,0, 0,... ] 
to select that we “begin” with zero selections, and r = 
[0,..., 0,1] denoting that we “end” with D selections. 

The product of the MPO and the product state results in an 
MPS, which can be sampled from easily with total cost DR 
by taking advantage of the fact that A is very sparse. To do 
this, one sequentially samples whether an index is selected or 
not. Starting with the leftmost site, we take its probability 
distribution by tracing over the sites to the right by summing 
with the vector 1 = [1,1], as shown in this diagram: 


(26) 

After generating a sample with probability p(si) which is ra¬ 
tio of weights of selecting or not selecting the first state (leav¬ 
ing the site with fully determined probability distributions for 
‘no’ n = [1, 0] or for ‘yes’ y = [0,1]), we move on to the sec¬ 
ond site to determine it’s conditional probability p(s 2 |si). The 
process is repeated. For example, after three steps, sampling 
‘yes’, ‘yes’, ‘no’ we end up with the following calculation of 
p(s 4 |si,S 2 ,s 3 ): 




At the end we get the total probability distribution that we 
wanted, 



FIG. 6: The sampling distribution for singular values Sf = 2“\ 
when selecting 50 of 100 possible Schmidt vectors (i.e. D = 100 
and D' = 50). With high probability the largest are selected, while 
the tails of the distribution are sampled only occasionally. 


constrained total particle number D and appropriate energies 

-k B T\og(Sf). 

The above steps can be performed in three simple ‘sweeps’ 
of the matrix-product probability distribution. Care must 
be taken to avoid overflow and underflow, even of double¬ 
precision numbers, as the product in Eq. (22 1 can grow very 
large or small even with moderate D. Code for perform¬ 
ing this algorithm in the Julia programming language is pro¬ 
vided freely and is available for download from github at 
http://github.com/andyferris/sampling. 


The Ising model as a tensor network state 


p(si)p(s 2 |si)p(s 3 |si, S 2 ) • ■ • p(sd|si, ■ • ■, Sfl-i) 

= p(s 1 ,...,s D ). (28) 

The final thing that is required is the probability of selec¬ 
tion of any individual Schmidt vector, p(si). This is achieved 
simply by summing over all other configurations, like inp(si) 
above. Below we calculate p(s 4 ). 



Each individual selected vector i must be multiplied by the in¬ 
verse of its appearance rate, 1 /p(si), in order to create an un¬ 
biased Monte Carlo technique and to satisfy Eq. 0- In Fig. [6] 
we show the sampling distribution p(s, ) for a distribution of 
singular values Sf = 2~ l . One notes that the sampling dis¬ 
tribution looks very similar to the Fermi-Dirac distribution — 
this is precisely because it is the Fermi-Dirac distribution with 


The thermal states of classical statistical models can gener- 
ically be written in a tensor product form. We specialize here 
to the case of 2-body Hamiltonians acting on a square lattice, 

H(s) = ^ H s i, s j)> (30) 

<i,j> 

where s = [si, s 2 , ■ ■ • ] denotes a particular configuration. We 
wish to calculate the probability distribution 


P( s) = 


exp(-7T(s)/fc s T) 


ex P (E<zj> Sj)/k B T 


exp (h( Si , Sj)/k B T ) 


(31) 


and the related partition function Z 


exp (h(si,Sj)/k B T) (32) 

s <i,j> 
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FIG. 7: The partition function of a statistical system on an open 
square lattice with nearest-neighbour two-body interactions. 


To simplify the notation, we write the 2-body transfer matrix, 

T(s u s 2 ) = exp(h(si,Sj)/k B T), (33) 

such that 

z= e n (34 > 

s <i,j> 

For the Ising model in Eq. ([9]), the transfer matrix is 

, , T exp (J/k B T) exp (-J/k B T) 1 

[exp (~J/k B T) exp(J/k B T) J ^ 

To express Z as a tensor network, it is convenient to intro¬ 
duce a generalized Kronecker-d function (also called a ‘copy 
tensor’) 


^ i t j,k ,... — 3i,j&i,k ■ • • > (36) 

which is zero unless i = j = k - .... It is drawn as a simple 
dot in tensor network diagrams, like 

j 

k — $ijkl 

l (37) 

This lets us easily ‘stitch’ together the transfer matrices 
onto whatever lattice we like. The partition function on a 
square lattice is shown in Fig. [7] To convert this to a stan¬ 
dard ‘vertex’ model, we make the following identification 


(38) 

which leads to the structure in Fig. |T](a), given the appropriate 
modifications to the tensors on the edges and corners. Extra 





bonds can be ‘opened up’ at lattice sites for the purpose of 
measuring local observables, such as the magnetization in the 
centre of the lattice. 
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